Under-Approximating Expected Total Rewards in POMDPs
Authors
Abstract
We consider the problem: is the optimal expected total reward to reach a goal state in a partially observable Markov decision process (POMDP) below a given threshold? We tackle this (generally undecidable) problem by computing under-approximations on these total rewards. This is done by abstracting finite unfoldings of the infinite belief MDP of the POMDP. The key issue is to find a suitable under-approximation of the value function. We provide two techniques: a simple (cut-off) technique that uses a good policy on the POMDP, and a more advanced technique (belief clipping) that uses minimal shifts of probabilities between beliefs. We use mixed-integer linear programming (MILP) to find such minimal probability shifts and experimentally show that our techniques scale quite well while providing tight lower bounds on the expected total reward.
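The cut-off idea in the abstract can be illustrated on a toy example: unfold the belief MDP of a POMDP to a fixed depth and assign every frontier belief a sound under-approximating value. The sketch below is ours, not code from the paper; the POMDP, its numbers, and all names are illustrative assumptions, and it uses the trivial cut-off value 0, which is a valid lower bound here because all rewards are non-negative (the paper's cut-off instead uses the value of a good POMDP policy, and its belief-clipping technique is not shown).

```python
from collections import defaultdict

# Illustrative toy POMDP: two indistinguishable states s0, s1 and an
# observable goal state. All names and numbers here are our assumptions.
T = {  # T[state][action] = list of (successor, probability)
    "s0": {"a": [("goal", 0.5), ("s1", 0.5)], "b": [("s0", 1.0)]},
    "s1": {"a": [("goal", 0.8), ("s0", 0.2)], "b": [("s1", 1.0)]},
}
OBS = {"s0": "mid", "s1": "mid", "goal": "done"}
GOAL_REWARD = 1.0  # reward collected on entering the goal


def successors(belief, action):
    """Split the successor distribution of a belief by observation.

    Returns (expected immediate reward, {obs: (prob, conditional belief)}).
    """
    imm = 0.0
    branches = defaultdict(lambda: defaultdict(float))
    for s, p in belief.items():
        for s2, q in T[s][action]:
            if s2 == "goal":
                imm += p * q * GOAL_REWARD
            branches[OBS[s2]][s2] += p * q
    out = {}
    for obs, dist in branches.items():
        prob = sum(dist.values())
        out[obs] = (prob, {s: w / prob for s, w in dist.items()})
    return imm, out


def lower_bound(belief, depth):
    """Under-approximate the optimal expected total reward by unfolding
    the belief MDP to a fixed depth; frontier beliefs get cut-off value 0,
    which is sound here because all rewards are non-negative."""
    if depth == 0:
        return 0.0  # cut-off at the frontier of the finite unfolding
    best = 0.0
    for action in ("a", "b"):
        imm, branches = successors(belief, action)
        val = imm
        for obs, (prob, nb) in branches.items():
            if obs != "done":  # goal is absorbing; no further reward
                val += prob * lower_bound(nb, depth - 1)
        best = max(best, val)
    return best


b0 = {"s0": 0.5, "s1": 0.5}  # uniform initial belief over s0, s1
bounds = [lower_bound(b0, k) for k in range(1, 6)]
```

Deeper unfoldings yield monotonically non-decreasing lower bounds that approach the true optimal value (here 1.0, since the goal is reached almost surely under action "a"), mirroring how larger abstractions tighten the under-approximation.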
Similar resources
Continuous Time Markov Decision Processes with Expected Discounted Total Rewards
Abstract. This paper discusses continuous time Markov decision processes with criterion of expected discounted total rewards, where the state space is countable, the reward rate function is extended real-valued and the discount rate is a real number. Under necessary conditions that the model is well defined, the state space is partitioned into three subsets, on which the optimal value function ...
Counterexamples for Expected Rewards
The computation of counterexamples for probabilistic systems has gained a lot of attention during the last few years. All of the proposed methods focus on the situation when the probabilities of certain events are too high. In this paper we investigate how counterexamples for properties concerning expected costs (or, equivalently, expected rewards) of events can be computed. We propose methods ...
Genetic Algorithms for Approximating Solutions to POMDPs
We use genetic algorithms (GAs) to find good finite horizon policies for POMDPs, where the search is limited to policies with a fixed finite amount of policy memory. Initial results were presented in (Lusena et al. 1999) with one GA. In this paper, different cross-over and mutation rates are compared. Initializing the population of the genetic algorithm is done using smaller genetic algorithms. The sele...
POMDPs under Probabilistic Semantics
We consider partially observable Markov decision processes (POMDPs) with limitaverage payoff, where a reward value in the interval [0, 1] is associated to every transition, and the payoff of an infinite path is the long-run average of the rewards. We consider two types of path constraints: (i) quantitative constraint defines the set of paths where the payoff is at least a given threshold λ1 ∈ (...
Policy Iteration Algorithms for DEC-POMDPs with Discounted Rewards
Over the past seven years, researchers have been trying to find algorithms for the decentralized control of multiple agents under uncertainty. Unfortunately, most of the standard methods are unable to scale to real-world-size domains. In this paper, we come up with promising new theoretical insights to build scalable algorithms with provable error bounds. In the light of the new theoretical insi...
Journal
Journal title: Lecture Notes in Computer Science
Year: 2022
ISSN: ['1611-3349', '0302-9743']
DOI: https://doi.org/10.1007/978-3-030-99527-0_2